Update DI to CU converter for GA #131

chienyuanchang · 2026-01-16T21:40:22Z

Purpose

...

Does this introduce a breaking change?

[ ] Yes
[ ] No

Pull Request Type

What kind of change does this Pull Request introduce?

[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

How to Test

Get the code

git clone [repo-address]
cd [repo-name]
git checkout [branch-name]
npm install

Test the code

What to Check

Verify that the following are valid

...

Other Information

chienyuanchang · 2026-01-20T17:52:57Z

python/di_to_cu_migration_tool/cu_converter_generative.py

            elif value_type == "number":
                try:
-                    di_label["valueNumber"] = float(value.get("content"))  # content can be easily converted to a float
+                    content_val = value.get("content")


Hi @aainav269, I hit errors when converting fields that were labeled by region in DI Studio, because those fields have no content. Have we run into this error before, and is it OK to set the value to None in that case?

I do remember seeing these region fields before, but only in DI 3.1. I think we decided to just ignore these region fields when converting to CU.

I see. I need to set the value to None to avoid the errors.
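For illustration, the None fallback discussed in this thread could be sketched as below. This is not the code in the PR: `parse_number_content` is a hypothetical helper name, and only the `content` key and the `float(...)` conversion come from the diff above.

```python
from typing import Any, Mapping, Optional


def parse_number_content(value: Mapping[str, Any]) -> Optional[float]:
    """Return the field content as a float, or None when it cannot be converted.

    Hypothetical helper: region-labeled fields are assumed to arrive
    without a usable "content" entry.
    """
    content = value.get("content")
    if content is None:
        # Region-labeled field: nothing to convert.
        return None
    try:
        return float(content)
    except (TypeError, ValueError):
        # Content is present but is not a parseable number.
        return None
```

The result would then be assigned to `di_label["valueNumber"]`, so a region field ends up with `None` instead of raising.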

python/di_to_cu_migration_tool/get_ocr.py

chienyuanchang · 2026-01-20T18:21:13Z

python/di_to_cu_migration_tool/cu_converter_generative.py


 # imports from same project
-from constants import CU_API_VERSION, MAX_FIELD_LENGTH, VALID_CU_FIELD_TYPES
+from constants import CU_API_VERSION, MAX_FIELD_LENGTH, VALID_CU_FIELD_TYPES, COMPLETION_MODEL, EMBEDDING_MODEL


Hi @aainav269, I found that we only validate the length of the field name and do not check or normalize the name against the current CU field-name restrictions. It also looks like we don't check or remove the field format. Do you recall the discussion about field name normalization in this tool?

Yes, we decided then to remove the fields that exceed the field name length. One point of discussion was whether shortening a field name could make it collide with another field. For example, if we have ...._Yes and ...._No and we shorten both, both would become ....

I don't think we ever validated the field format. I think we assumed that if the field was already generated by DI, the format would apply to CU as well. What are you thinking of enforcing for this?

CU has more restrictions on field names than DI: no whitespace, and no symbols other than underscores. If we didn't skip this intentionally, I will add some logic to validate and modify the names.
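One possible shape for that validation is sketched below. It is an assumption-heavy illustration, not the logic in the PR: `normalize_field_names` is a hypothetical name, the default `max_length` of 64 is a placeholder rather than the real `MAX_FIELD_LENGTH`, and the allowed character set (letters, digits, underscores) is inferred only from the comment above. Unlike the earlier decision to drop over-long fields, this sketch truncates them and appends a numeric suffix to resolve the ...._Yes / ...._No style of collision.

```python
import re
from typing import Dict, Iterable


def normalize_field_names(names: Iterable[str], max_length: int = 64) -> Dict[str, str]:
    """Map each original field name to a sanitized, unique name.

    Assumed rules: only letters, digits, and underscores are allowed,
    and names may not exceed max_length characters.
    """
    mapping: Dict[str, str] = {}
    used = set()
    for name in names:
        # Replace whitespace and any other disallowed symbol with an underscore.
        cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
        # Collapse runs of underscores and trim them from both ends.
        cleaned = re.sub(r"_+", "_", cleaned).strip("_") or "field"
        candidate = cleaned[:max_length]
        counter = 1
        # On collision, shorten further and append a numeric suffix.
        while candidate in used:
            suffix = f"_{counter}"
            candidate = cleaned[: max_length - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping
```

Returning a mapping rather than a list keeps the original DI name available, so labels that reference the old name can be rewritten consistently.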

python/di_to_cu_migration_tool/.sample_env

aainav269 · 2026-01-21T22:54:45Z

python/di_to_cu_migration_tool/get_ocr.py


   # Set the global variables
-    api_version = os.getenv("API_VERSION")
+    api_version = os.getenv("API_VERSION") or CU_API_VERSION


What if the OS environment doesn't match the api-version?

This code first tries to read API_VERSION from the environment. If it is not set, the program falls back to the default CU_API_VERSION. We could also consider always using the latest default API version instead of letting the user supply one.
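The fallback behavior described here can be illustrated in isolation. In this sketch, `resolve_api_version` is a hypothetical helper and `DEFAULT_API_VERSION` is a placeholder standing in for `CU_API_VERSION` from `constants`, whose actual value is not shown in this thread. Note that `or` also replaces an empty string, not only a missing variable.

```python
import os
from typing import Mapping, Optional

# Placeholder standing in for constants.CU_API_VERSION; not the real value.
DEFAULT_API_VERSION = "placeholder-version"


def resolve_api_version(
    env: Optional[Mapping[str, str]] = None,
    default: str = DEFAULT_API_VERSION,
) -> str:
    """Prefer API_VERSION from the environment; fall back to the default."""
    env = os.environ if env is None else env
    # `or` treats both a missing variable and an empty string as "not set".
    return env.get("API_VERSION") or default
```

This does not check that a user-supplied version is one the tool supports, which is the case the review question points at; always using the default, as suggested above, would sidestep that.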

aainav269 · 2026-01-21T22:55:31Z

python/di_to_cu_migration_tool/get_ocr.py

-                "Apim-Subscription-id": f"{subscription_key}",
-                "Content-Type": "application/pdf",
+                "Ocp-Apim-Subscription-Key": f"{subscription_key}",
+                "Content-Type": "application/octet-stream",


Why are we changing the content type while keeping the content the same?

I'm not sure what you mean by the same content. We may have image files. Do we only support PDF files in this tool?
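If the tool does need to handle both PDFs and images, one alternative to a fixed content type is to guess it from the file name and fall back to `application/octet-stream`. The sketch below is only an illustration: `build_headers` is a hypothetical helper, and whether the service requires or accepts a specific type per file is not confirmed in this thread. The header name is taken from the diff above.

```python
import mimetypes
from typing import Dict


def build_headers(file_name: str, subscription_key: str) -> Dict[str, str]:
    """Build request headers, guessing Content-Type from the file extension."""
    guessed, _encoding = mimetypes.guess_type(file_name)
    return {
        "Ocp-Apim-Subscription-Key": subscription_key,
        # Unknown extensions fall back to a generic binary type.
        "Content-Type": guessed or "application/octet-stream",
    }
```

The PR's choice of always sending `application/octet-stream` is simpler and avoids depending on file extensions being accurate.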

Chien Yuan Chang added 2 commits January 16, 2026 13:39

first version

c0d1406

small improvements

939cf00

chienyuanchang requested a review from aainav269 January 20, 2026 17:50


aainav269 reviewed Jan 21, 2026


aainav269 approved these changes Jan 21, 2026


3 participants
